Protein Sub-Nuclear Localization Prediction Using SVM and Pfam Domain Information
Abstract
The nucleus is the largest and most highly organized organelle of eukaryotic cells. Within the nucleus there are a number of pseudo-compartments that are not separated by any membrane, yet each contains only a specific set of proteins. Understanding protein sub-nuclear localization can therefore be an important step towards understanding the biological functions of the nucleus. Here we describe SubNucPred, a method we developed for predicting the sub-nuclear localization of proteins. The method predicts protein localization for 10 different sub-nuclear locations sequentially by combining the presence or absence of unique Pfam domains with an SVM model based on amino acid composition. The prediction accuracy during leave-one-out cross-validation was 85.05% for centromeric proteins, 76.85% for chromosomal proteins, 81.27% for nuclear speckle proteins, 81.79% for nucleolar proteins, 79.37% for nuclear envelope proteins, 77.78% for nuclear matrix proteins, 76.98% for nucleoplasm proteins, 88.89% for nuclear pore complex proteins, 75.40% for PML body proteins and 83.33% for telomeric proteins. Comparison with other reported methods showed that SubNucPred performs better than they do. A web server for predicting protein sub-nuclear localization, named SubNucPred, has been established at http://14.139.227.92/mkumar/subnucpred/. A standalone version of SubNucPred can also be downloaded from the web server.
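The combination of domain lookup and composition-based classification described in the abstract can be illustrated with a minimal Python sketch. It is not the SubNucPred implementation: the order in which locations are tested, the rule that a unique-domain hit takes precedence over the classifier, the zero decision threshold, and every name below (`amino_acid_composition`, `predict_location`, `unique_domains`, `classifiers`) are assumptions made for illustration. Each classifier is a plain callable standing in for a trained SVM decision function.

```python
from typing import Callable, Dict, List, Optional, Set

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> List[float]:
    """Return the percentage of each standard amino acid in the sequence."""
    residues = [aa for aa in sequence.upper() if aa in AMINO_ACIDS]
    total = len(residues)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * residues.count(aa) / total for aa in AMINO_ACIDS]


def predict_location(
    sequence: str,
    domains: Set[str],
    unique_domains: Dict[str, Set[str]],
    classifiers: Dict[str, Callable[[List[float]], float]],
    order: List[str],
) -> Optional[str]:
    """Test locations one after another (hypothetical cascade).

    For each location, a domain unique to that location decides the call;
    otherwise the composition-based score is used (assumed threshold: 0).
    """
    features = amino_acid_composition(sequence)
    for location in order:
        if domains & unique_domains.get(location, set()):
            return location
        if classifiers[location](features) > 0.0:
            return location
    return None  # no location accepted the protein


# Toy data: domain identifiers and scoring functions are placeholders.
toy_unique = {"nucleolus": {"DOMAIN_X"}}
toy_classifiers = {
    "centromere": lambda f: -1.0,  # always rejects
    "nucleolus": lambda f: -1.0,   # always rejects
}
toy_order = ["centromere", "nucleolus"]
hit = predict_location("MKKAAC", {"DOMAIN_X"}, toy_unique, toy_classifiers, toy_order)
miss = predict_location("MKKAAC", set(), toy_unique, toy_classifiers, toy_order)
```

In the toy call, `hit` is assigned through the domain rule even though both stand-in classifiers reject the protein, while `miss` falls through every location. In practice each callable would be replaced by a model trained on composition vectors for that location.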
Related articles
Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks
Background: Prediction of the protein localization is among the most important issues in the bioinformatics that is used for the prediction of the proteins in the cells and organelles such as mitochondria. In this study, several machine learning algorithms are applied for the prediction of the intracellular protein locations. These algorithms use the features extracted from pro...
Protein subcellular localization of fluorescence imagery using spatial and transform domain features
MOTIVATION: Subcellular localization of proteins is one of the most significant characteristics of living cells. Prediction of protein subcellular locations is crucial to the understanding of various protein functions. Therefore, an accurate, computationally efficient and reliable prediction system is required. RESULTS: In this article, the predictions of various Support Vector Machine (SVM) mo...
Localizome: a server for identifying transmembrane topologies and TM helices of eukaryotic proteins utilizing domain information
The Localizome server predicts the transmembrane (TM) helix number and TM topology of a user-supplied eukaryotic protein and presents the result as an intuitive graphic representation. It utilizes hmmpfam to detect the presence of Pfam domains and a prediction algorithm, Phobius, to predict the TM helices. The results are combined and checked against the TM topology rules stored in a protein do...
Supplement for: Domain prediction with probabilistic directional context
Pfam 30 (16,306 HMMs) provides the PfamA.full.uniprot file that corresponds to UniProt 2016_02 (46,974,580 proteins). This file was used to obtain dPUC2’s observed family pair counts, CODD’s list of certified domain pairs [1], and DAMA’s domain information and observed architectures [2]. We used the HMMER 3.1b2 version of hmmscan to predict domains (this version is required by Pfam 30). We down...
SVM-Prot: web-based support vector machine software for functional classification of a protein from its primary sequence
Prediction of protein function is of significance in studying biological processes. One approach for function prediction is to classify a protein into functional family. Support vector machine (SVM) is a useful method for such classification, which may involve proteins with diverse sequence distribution. We have developed a web-based software, SVMProt, for SVM classification of a protein into f...